A 12-Channel Real-Time GPS Software Receiver

B.M. Ledvina, M.L. Psiaki, S.R Powell, and P.M. Kintner 

12.12.2002 

Abstract 

A GPS receiver has been developed that runs 12 tracking channels in real-time using 
a software correlator. This work is part of an effort to develop a flexible receiver that 
can use new GPS signals as they become available without the need for new correlator 
hardware. The receiver consists of an RF front-end, a system of shift registers, a
digital data acquisition (DAQ) card, and software that runs on a 1.73 GHz PC. The
commercial RF front-end down converts the signal into a 2-bit digital data stream at
5.714 MHz. The shift registers parallelize the magnitude and sign data bit streams 
into separate words, which the DAQ reads into the PC's memory using direct memory 
access. The PC performs base-band mixing and PRN code correlations in a manner 
that directly simulates a hardware digital correlator. It also performs the usual signal 
tracking and navigation functions, under the control of a real-time Linux operating 
system. 

The software correlator receives frequency commands for simulated carrier and code 
NCOs and, in effect, uses these to reconstruct carrier and code replicas which it mixes 
with the input data stream. The resulting signals are summed to produce the standard 
in-phase and quadrature, prompt and early-minus-late accumulations. These, along 
with the phases of the 2 NCOs, are sent back to the part of the code that executes 
the tracking loops and the navigation functions. The contributions of this work are a 
set of special high-speed algorithms for doing the correlations in software. They make 
use of bit-wise parallelism so that a single C-code command (partially) processes 32 
samples at a time. 

This system has been tested using a roof-mounted antenna. When operating with 
12 channels, the entire receiver uses less than 50% of the capacity of the 1.73 GHz 
processor and navigates to an accuracy of 10 meters. 

1 Introduction

A real-time software receiver architecture can provide GPS user equipment with operational 
flexibility that will prove more and more useful as time goes by. The current GPS system is 
slated to expand its capabilities to include new civilian codes on the L2 frequency and a new 
L5 frequency. A receiver that uses a hardware correlator will require hardware modifications 
in order to use these new signals. In the near term, a receiver designer will be faced with 
a complex trade-off in order to decide whether the extra complexity is worth the improved 
performance that will accrue only very slowly as new GPS satellites replace older models. 
A software receiver can use new signals without the need for a new correlator chip. New 
frequencies and new pseudo-random number (PRN) codes can be used simply by making 
software changes. Thus, software receiver technology will lessen the risks involved for de- 
signers during the period of transition to the new signals. Furthermore, a software receiver 
could be reprogrammed to use the Galileo system, GLONASS, or both, which provides an 
added benefit from the use of a software radio architecture. Thus, there are good reasons to 
develop practical real-time software GPS receivers. 

A GPS receiver can be broken down into various components (see Figure 1). First, an
antenna, possibly followed by a pre-amp, receives the L-band GPS signals. After the antenna
comes an RF section that filters and down converts the GHz GPS signal to an intermediate
frequency in the MHz range. The RF section also digitizes the signal. The next section is the
correlator chip that separates the signal into different channels allocated to each satellite. A
modern receiver has 10 or more channels. For each satellite, the correlator mixes the Doppler
shifted intermediate frequency signal to base-band and correlates it with a local copy of a
PRN code. The final components of the receiver involve software routines that track the
signals, demodulate the navigation message, and compute the navigation solution.

Figure 1: A typical GPS receiver with special purpose hardware and general hardware
separated.

Figure 2: A typical GPS software receiver showing the separation between special purpose
hardware and general hardware.

A software receiver differs from a standard receiver in one very distinct way (see Figure 
2). The functions of the correlator chip are moved to software running on a general purpose 
processor. Doing so changes the components and layout of the receiver. The RF front-end 
is repackaged into a device called a bit-grabber, which outputs a binary bit-stream. A data 
buffering and acquisition system reads the bit-stream into a computer. The bit-stream is 
then available for processing by a software correlator running on the PC's processor. 

The notion of a software GPS receiver has been around for several years. In the recent 
past, GPS software receivers have been developed that either post-process stored signals or
operate in real-time. Previous real-time software receivers function with a limited number of
channels (4-6) and require high-end computer speeds or DSP chips [Akos et al., 2001a and
Akos et al., 2001b]. The work presented in this paper improves upon these previous works
in two ways. First, the software receiver discussed here is approximately 3 times faster 
than the results presented in Akos [2001a], thus enabling a 12-channel receiver. Second, this 
paper fully explains the algorithms used to compute the correlation accumulations that are 
required for acquisition and tracking. 

The remaining portions of this paper explain the internal workings of a 12-channel real- 
time GPS software receiver and present experimental performance results for this system. 
The second section describes the hardware including the bit-grabber and PC. The third 
section reviews the structure of spread-spectrum signals and methods of acquisition and 
tracking. The next section presents a short description of an existing PC-based GPS re- 
ceiver that has been modified to develop the present receiver. The fifth section presents an 
overview of the software correlator design. Section 6 gives a lengthy description of the math- 
ematics behind base-band mixing and correlation leading into the implementation used in 
the software correlator. Section 7 discusses timing and measurements made by the software 
correlator. Section 8 describes how to keep all of the calculations in integer format. Section 
9 presents some performance results. Section 10 gives a summary and concluding remarks. 

2 System Configuration 

Central to the software GPS receiver is the personal computer. The current system consists of 
a PC with a 1.73 GHz AMD Athlon processor running the RT-Linux operating system. RT- 
Linux is a hard real-time variant of Linux implemented as a set of patches to the standard 
Linux kernel. Due to its real-time optimized design, RT-Linux provides very low latency
interrupt responsiveness along with the ability to execute threads at regular intervals. This 
translates into a highly efficient and responsive operating system that reliably executes time 
critical code. An additional feature of RT-Linux is that it keeps the functionality of Linux 
by running the kernel as the lowest priority thread. By retaining the functionality of Linux, 
it is very easy to develop, test, debug, and run real-time software. Another benefit of using 
Linux is that tools such as drivers, a C compiler, and text editors are readily and freely
available. 

The next component of the software receiver is the bit-grabber. The bit-grabber consists 
of a Mitel GP2015 RF front-end. The front-end down converts the nominal 1.57542 GHz GPS
signal to an intermediate frequency of (88.54/63) x 10^6 Hz = 1.4053968254 MHz and then
performs analog-to-digital conversion. The resultant, digitized signal has two binary bits 
per sample corresponding to a sign and a magnitude. The possible values for the digitized 
signal are ±1 and ±3. Table 1 shows how to convert the binary sign and magnitude bits into 
integer values. The two binary bits are available as outputs from the bit-grabber. In order 
to provide accurate timing, the sign and magnitude bits are synchronized to a (40/7) x 10^6
Hz = 5.714 MHz clock signal, which is the third output from the bit-grabber card.

Another type of bit-grabber which uses a direct ADC down conversion implementation 
has also been used. The heart of this bit-grabber is a very fast analog-to-digital converter 
that can deal with an input bandwidth of up to 2 GHz and that can perform 8-bit conversion 
at continuous sample rates up to 1 GHz. The ADC samples the GPS signal at 5.714 MHz, 
which aliases the L1 carrier down to a nominal frequency of approximately 1.722 MHz. Each 8-bit
sample gets processed by a separate logic unit to create sign and magnitude pairs with an 
appropriate input gain that minimizes the signal-to-noise ratio's digitization loss.



Sign  Mag  |  Value
  0    0   |   -1
  0    1   |   -3
  1    0   |   +1
  1    1   |   +3

Table 1: Sign and magnitude combinations of the input GPS signal
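
The mapping of Table 1 can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical, and the assumption that sample 0 sits in bit 0 of each packed word is ours, since the text does not state the hardware bit order.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 2-bit front-end sample into its integer value per Table 1.
 * sign: 0 -> negative, 1 -> positive; mag: 0 -> 1, 1 -> 3. */
static int decode_rf_sample(unsigned sign, unsigned mag)
{
    assert(sign <= 1u && mag <= 1u);
    int magnitude = mag ? 3 : 1;
    return sign ? magnitude : -magnitude;
}

/* Decode sample number `bit` (0..15) from packed 16-bit sign and magnitude
 * words. Assumption: bit 0 holds sample 0; the real bit order may differ. */
static int decode_packed_sample(uint16_t sign_word, uint16_t mag_word, unsigned bit)
{
    assert(bit < 16u);
    return decode_rf_sample((sign_word >> bit) & 1u, (mag_word >> bit) & 1u);
}
```

A decoder like this is useful only for checking; the correlator itself never expands samples into integers.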

A data acquisition system reads the digitized sign and magnitude bits from the bit- 
grabber into the PC. To make the process of reading data into the PC more efficient and to 
prepare for efficient correlation calculations, the DAQ card reads 32 bits of buffered samples 
at a time. The 32 bits consist of 16 sign bits and 16 magnitude bits. A series of shift
registers buffer the data, packing the sign and magnitude bits into separate 16-bit words.
A divide-by-16 counter converts the 5.714 MHz clock down to 357.14 kHz, which provides a
signal indicating when the buffer is full. 

The data acquisition system consists of a PC card and driver software. The card is a 
National Instruments PCI-DIO-32HS digital I/O card. Pertinent features of this card are the 
32 digital input lines, direct memory access (DMA) and availability of a driver for RT-Linux. 
A suite of open source drivers and application interface software for DAQ cards known as 
COMEDI (COntrol and MEasurement and Device Interface) is freely available. COMEDI 
provides Linux/RT-Linux support for nearly one hundred DAQ cards spanning numerous 
manufacturers. One of the strong points of COMEDI is that it includes very general drivers, 
which are easily modifiable for specific applications. The demands of the software receiver 
necessitate certain modifications to the stock COMEDI driver for the PCI-DIO-32HS card. 
The modifications include increasing the number of input bits from 16 to 32, enabling DMA, 
and modifying the driver to support continuous interrupt-driven acquisition.

The software receiver is written entirely in C-code using tools available from standard 
Linux distributions. To promote portability of the software, no processor-specific assembly
language or special instructions are used. 

3 Review of the GPS Spread Spectrum Signal and Receiver Correlations

The received time-domain L1 coarse/acquisition (C/A) signal that gets output by the RF
front-end is:

$$y(t_i) = \sum_j A_j D_{jk} C_j\left[0.001\left(\frac{t_i - \tau_{jk}}{\tau_{jk+1} - \tau_{jk}}\right)\right] \cos\left[\omega_{IF} t_i - \phi_j(t_i)\right] + n_j \tag{1}$$

where $t_i$ is the sample time, $A_j$ is the amplitude, $D_{jk}$ is the navigation data bit, $C_j[t]$ is the
C/A code, $\tau_{jk}$ and $\tau_{jk+1}$ are the start times of the $k$-th and $(k+1)$-th C/A code periods, $\omega_{IF}$ is the
intermediate image of the L1 carrier frequency, $\phi_j(t_i)$ is the carrier phase perturbation due
to accumulated delta range, $n_j$ is the receiver noise, and the subscript $j$ refers to a particular
GPS satellite. The summation is over all visible GPS satellites. The negative sign in front of
$\phi_j(t_i)$ comes from the high-side mixing that occurs in the RF front-end that has been used.

A GPS receiver works with correlations between the received signal and a replica of it. 
The correlations are used to acquire and track the signal. The replica is composed of two 
parts, the carrier replica and the C/A code replica. Two carrier replica signals are used, an 
in-phase signal and a quadrature signal. When mixed with the code replica they form the 
in-phase and quadrature replicas: 
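
$$y_{Ij}(t_i) = C_j\left[0.001\left(\frac{t_i - \hat{\tau}_{jk}}{\hat{\tau}_{jk+1} - \hat{\tau}_{jk}}\right)\right] \cos\left\{\omega_{IF} t_i - \left[\hat{\phi}_{jk} + \hat{\omega}_{Dopp\,jk}(t_i - \hat{\tau}_{jk})\right]\right\} \tag{2}$$

$$y_{Qj}(t_i) = -C_j\left[0.001\left(\frac{t_i - \hat{\tau}_{jk}}{\hat{\tau}_{jk+1} - \hat{\tau}_{jk}}\right)\right] \sin\left\{\omega_{IF} t_i - \left[\hat{\phi}_{jk} + \hat{\omega}_{Dopp\,jk}(t_i - \hat{\tau}_{jk})\right]\right\} \tag{3}$$

where equations (2) and (3) apply during the $k$-th C/A code period. In these equations $\hat{\tau}_{jk}$
and $\hat{\tau}_{jk+1}$ are the receiver's estimates of the start times of the $k$-th and $(k+1)$-th code periods,
$\hat{\phi}_{jk}$ is the estimated carrier phase at time $\hat{\tau}_{jk}$, and $\hat{\omega}_{Dopp\,jk}$ is the estimated carrier Doppler
shift during the $k$-th code period.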

A typical receiver computes the estimates $\hat{\tau}_{jk}$, $\hat{\tau}_{jk+1}$, $\hat{\phi}_{jk}$, and $\hat{\omega}_{Dopp\,jk}$ by various means
that are described in [Van Dierendonck, 1996]. These include open-loop acquisition methods
and closed-loop signal tracking methods such as a delay-locked loop to compute $\hat{\tau}_{jk}$ and
$\hat{\tau}_{jk+1}$ and a phase-locked loop or a frequency-locked loop to compute $\hat{\phi}_{jk}$ and $\hat{\omega}_{Dopp\,jk}$. The
software receiver developed here uses standard techniques for forming these estimates. These
techniques are not discussed in detail here.

The receiver uses the carrier and code replicas to compute the following in-phase and 
quadrature correlation accumulations: 



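$$I_{jk}(\Delta) = \sum_{i=i_k}^{i_k+N_k} y(t_i)\, C_j\left[0.001\left(\frac{t_i + \Delta - \hat{\tau}_{jk}}{\hat{\tau}_{jk+1} - \hat{\tau}_{jk}}\right)\right] \cos\left\{\omega_{IF} t_i - \left[\hat{\phi}_{jk} + \hat{\omega}_{Dopp\,jk}(t_i - \hat{\tau}_{jk})\right]\right\} \tag{4}$$

$$Q_{jk}(\Delta) = -\sum_{i=i_k}^{i_k+N_k} y(t_i)\, C_j\left[0.001\left(\frac{t_i + \Delta - \hat{\tau}_{jk}}{\hat{\tau}_{jk+1} - \hat{\tau}_{jk}}\right)\right] \sin\left\{\omega_{IF} t_i - \left[\hat{\phi}_{jk} + \hat{\omega}_{Dopp\,jk}(t_i - \hat{\tau}_{jk})\right]\right\} \tag{5}$$

where $i_k$ is the index of the first RF front-end sample time that obeys $\hat{\tau}_{jk} \le t_{i_k}$ and $N_k + 1$
is the total number of samples that obey $\hat{\tau}_{jk} \le t_i < \hat{\tau}_{jk+1}$. The time offset $\Delta$ causes the
replica PRN code to play back early if it is positive and late if $\Delta$ is negative. One of the
main contributions of the present work is developing an efficient technique for the receiver
to accumulate $I_{jk}$ and $Q_{jk}$ in software.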
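
For orientation, the sums in equations (4) and (5) can be written as a plain sample-by-sample loop. The sketch below is a slow reference form, not the receiver's method. It assumes the code and carrier replicas have already been evaluated at the sample times and passed in as integer arrays; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long i; long q; } iq_accum_t;

/* Sample-by-sample evaluation of the sums in equations (4) and (5).
 * y[n]       : front-end samples (+-1, +-3)
 * code[n]    : C/A code replica at each sample time (+-1)
 * cos_rep[n] : carrier replica cosine at each sample time
 * sin_rep[n] : carrier replica sine at each sample time
 * The replicas are integers, so their quantization is the caller's choice. */
static iq_accum_t accumulate_reference(const int *y, const int *code,
                                       const int *cos_rep, const int *sin_rep,
                                       size_t n_samples)
{
    iq_accum_t acc = {0, 0};
    assert(y && code && cos_rep && sin_rep);
    for (size_t n = 0; n < n_samples; ++n) {
        long mixed = (long)y[n] * code[n];
        acc.i += mixed * cos_rep[n];   /* equation (4) */
        acc.q -= mixed * sin_rep[n];   /* equation (5), note the leading minus */
    }
    return acc;
}
```

A loop of this kind costs several multiplications per sample per channel, which is the burden the bit-wise parallel approach is meant to avoid.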

4 Use of Previously Existing GPS Receiver Software 

Previous work is important to the implementation of this real-time GPS software receiver. 
The Mitel GPSArchitect GPS receiver was ported to RT-Linux [Ledvina et al., 2000] and is
herein referred to as Cascade. The Cascade GPS software coupled with the Mitel chipset 
(GP2015 RF front-end and GP2021 correlator) on an ISA card forms a GPS receiver for the 
PC. Since Cascade provides standard GPS functions (signal tracking, data demodulation, 
navigation solution, etc.) and is designed to interact with the GP2021, it is included as 
part of the real-time software receiver. Thus, no new developments have been needed for 
standard receiver functions such as code and carrier frequency steering during acquisition 
and tracking. 

The software correlator is an independent RT-Linux module. The interface for interacting 
with this module has been designed to be similar to that of a hardware correlator. In fact, 
the software correlator closely mimics the GP2021 correlator in order to allow the Cascade 
GPS software to be merged with the software correlator. When using a hardware correlator, 
the receiver software interacts via I/O memory to read from and write to the correlator's 
registers. The software correlator uses an analogous strategy by implementing a shared 
memory buffer that both the correlator and the Cascade software can access. The memory 
buffer is implemented using the "mbuff" driver included with RT-Linux. This driver is
ideal for real-time situations since it allows sharing of memory between kernel modules and 
restricts the Linux kernel from swapping the shared memory space to disk. 

Thus, the software correlator has been designed as an independent module that interacts 
with other parts of the receiver according to well defined interface specifications. This 
modular approach provides flexibility in the internal workings of the receiver. One benefit 
of this modularity is that the mixing methods and correlation routines are transparent to 
the other standard software modules. This enables quick changes in correlator design that 
do not significantly affect other parts of the GPS receiver. 

5 Software Correlator Design 

Since a software correlator, as compared to a hardware correlator, does not process each
channel in parallel, the correlator calculations of a multi-channel software receiver represent a
heavy computational burden. Therefore, it is important to explain the step-by-step process 
of software correlation. An outline of the software correlator's functions that need to be 
completed every millisecond is given below. 

1. Obtain the most recent carrier and code frequencies from the acquisition or tracking
loops. 

2. For each channel, first mix the signal to base-band using the most recent carrier fre- 
quency and the accumulated carrier phase. Then, compute in-phase and quadrature 
prompt and early-minus-late correlations using the most recent code frequency and the 
accumulated code phase. 
3. Store the prompt and early-minus-late I's and Q's for use by the acquisition or tracking
loops. 

4. Repeat Steps 1-3 for each channel. 

5. If a measurement time (denoted as a "TIC") occurred, then store the current measurement
data including C/A code phases, epoch counters, carrier phases, and carrier
Doppler shifts.

6. Sleep for the remainder of the millisecond. 

An examination of the correlation timing requirements is in order. To correlate one 
millisecond of data on 12 channels, computations must be completed in less than one mil- 
lisecond. In order to leave computational time for other aspects of the GPS receiver, it is 
advisable to limit the processing of 12 channels to less than 750 microseconds. 

6 Mathematical Methods of Software Correlation 

A correlator has three main functions. First, it mixes a signal to base-band using the esti- 
mated carrier Doppler shift and carrier phase. Second, it mixes the base-band signal with a 
replica of the C/A code using the estimated code phase and code chipping rate. Third, it 
sums the resulting signal over a C/A code period. Since the received L1 signal has an uncertain
carrier phase, the correlator computes both in-phase and quadrature accumulations,
as defined in equations (4) and (5). 

6.1 Base-Band Mixing

Base-band mixing is a multiplication of an input signal by a complex exponential where the 
frequency of the complex exponential approximately matches that of the input signal. The 
resultant signal is centered at base-band. A complex signal can be broken down into cosine 
and sine components, resulting in separate in-phase and quadrature components. 

In typical terrestrial, marine and aeronautical applications, the Doppler shift can vary 
over a ±10 kHz range about the intermediate frequency. If one wants to implement a phase-locked
loop, then the frequency of the mixing signal must be controllable to within a few
millihertz. Furthermore, the mixing signal must have a continuously varying phase. 

In a hardware correlator, local oscillators generate cosine and sine signals that have 
precise frequency control and a continuous phase. This strategy is not feasible for a software 
correlator. Generating cosine and sine signals on the fly with the correct frequency and 
phase would be too time consuming. Instead, the software correlator generates cosine and 
sine signals on a grid of frequencies off-line. These signals are stored in memory for later 
recall. 

A strategy is needed in order to minimize the number of sine and cosine signals that must 
be stored. The signals must be stored on a time grid of points sampled at the RF front-end 
sampling frequency of 5.714 MHz, and the signals must last for a C/A PRN code period, 
i.e., for 0.001 sec. It would take tens of gigabytes of memory or more in order to brute-force 
store all frequencies on a 1 mHz grid ranging from -10 kHz to +10 kHz, not to mention
the question of storing a grid of possible starting phases at each frequency point. 
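
The storage figure can be checked with a short estimate. The sketch below assumes 2 bits per sample, separate cosine and sine tables, 5714 samples per millisecond, and a single starting phase per frequency; the function name is hypothetical.

```c
#include <assert.h>

/* Bytes needed to store 2-bit cosine and sine replicas on a frequency grid.
 * span_hz    : total Doppler span covered (20000 for -10 kHz .. +10 kHz)
 * spacing_hz : grid spacing in Hz
 * Each grid frequency needs 5714 samples (1 ms at 5.714 MHz), 2 bits per
 * sample, for both cosine and sine. Only one starting phase is counted. */
static double carrier_table_bytes(double span_hz, double spacing_hz)
{
    const double samples_per_ms = 5714.0;
    const double bits_per_sample = 2.0;
    const double waveforms = 2.0;                 /* cosine and sine */
    assert(spacing_hz > 0.0);
    double n_freqs = span_hz / spacing_hz + 1.0;  /* grid points, both ends included */
    return n_freqs * samples_per_ms * bits_per_sample * waveforms / 8.0;
}
```

Under these assumptions, `carrier_table_bytes(20000.0, 0.001)` comes to roughly 57 gigabytes before any grid of starting phases is added, which agrees with the "tens of gigabytes" estimate.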

A method has been developed that allows the receiver to accumulate I and Q values
using stored carrier replicas that fall only on a rough frequency grid and that all start with
a phase of zero. The rough frequency grid has a spacing of 175 Hz, and the resulting storage
requirements are on the order of 323 Kbytes. The resulting accumulations are



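$$I_{gjk}(\Delta) = \sum_{i=i_k}^{i_k+N_k} y(t_i)\, C_j\left[0.001\left(\frac{t_i + \Delta - \hat{\tau}_{jk}}{\hat{\tau}_{jk+1} - \hat{\tau}_{jk}}\right)\right] \cos\left[(\omega_{IF} - \omega_{gjk})(t_i - t_{0gjk})\right] \tag{6}$$

$$Q_{gjk}(\Delta) = -\sum_{i=i_k}^{i_k+N_k} y(t_i)\, C_j\left[0.001\left(\frac{t_i + \Delta - \hat{\tau}_{jk}}{\hat{\tau}_{jk+1} - \hat{\tau}_{jk}}\right)\right] \sin\left[(\omega_{IF} - \omega_{gjk})(t_i - t_{0gjk})\right] \tag{7}$$

where $\omega_{gjk}$ is the grid frequency that is closest to the estimated frequency $\hat{\omega}_{Dopp\,jk}$ and where
$t_{0gjk}$ is the time at which this carrier replica has zero carrier phase. These accumulations are
then rotated in order to create accurate approximations of what would have been computed
had the estimated carrier phase time history in equations (4) and (5) been used: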

$$I_{jk}(\Delta) = I_{gjk}(\Delta)\cos(\Delta\phi_{avgjk}) + Q_{gjk}(\Delta)\sin(\Delta\phi_{avgjk}) \tag{8}$$

$$Q_{jk}(\Delta) = -I_{gjk}(\Delta)\sin(\Delta\phi_{avgjk}) + Q_{gjk}(\Delta)\cos(\Delta\phi_{avgjk}) \tag{9}$$

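where $\Delta\phi_{avgjk}$ is the average phase difference between the grid carrier phase and the
estimated carrier phase averaged over the accumulation interval:

$$\Delta\phi_{avgjk} = \omega_{gjk}\left(\frac{\hat{\tau}_{jk} + \hat{\tau}_{jk+1}}{2} - t_{0gjk}\right) + \omega_{IF}\, t_{0gjk} - \hat{\phi}_{jk} - \hat{\omega}_{Dopp\,jk}\left(\frac{\hat{\tau}_{jk+1} - \hat{\tau}_{jk}}{2}\right) \tag{10}$$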

The validity of equations (8) and (9) is dependent on the assumption that 

$$1 - \cos\left[\frac{1}{2}\left(\omega_{gjk} - \hat{\omega}_{Dopp\,jk}\right)\left(\hat{\tau}_{jk+1} - \hat{\tau}_{jk}\right)\right] \ll 1 \tag{11}$$



Given a 175 Hz grid spacing and a nominal C/A PRN code period of 0.001 sec, the maximum 
value on the left-hand side of inequality (11) is 0.04, which respects the assumed limit. 
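
The 0.04 figure can be reproduced by hand. The short calculation below assumes that the worst-case frequency error is half the grid spacing (87.5 Hz) and that the cosine argument in inequality (11) is half the phase drift accumulated over one code period.

```latex
\begin{align*}
\left|\omega_{gjk} - \hat{\omega}_{Dopp\,jk}\right|_{\max} &= 2\pi \times 87.5\ \text{Hz} \approx 549.8\ \text{rad/s} \\
\tfrac{1}{2} \times 549.8\ \text{rad/s} \times 0.001\ \text{s} &\approx 0.275\ \text{rad} \\
1 - \cos(0.275) &\approx 0.0375 \approx 0.04
\end{align*}
```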

Note that equations (8) and (9) can be derived from equations (4) and (5) as follows:
First, one adds and subtracts the carrier phase of the grid signal in the arguments of the cosine
and sine terms. Second, one uses trigonometric identities to split the cosine and sine terms
into sums of products of cosine and sine terms. In
each product, one of the terms involves an argument like the arguments in the trigonometric
terms in equations (6) and (7). The other trigonometric terms are then approximated by
either $\cos(\Delta\phi_{avgjk})$ or $\sin(\Delta\phi_{avgjk})$. These approximations are valid because of the inequality
in equation (11) and because the average of $\sin\{(\omega_{gjk} - \hat{\omega}_{Dopp\,jk})[t_i - \frac{1}{2}(\hat{\tau}_{jk} + \hat{\tau}_{jk+1})]\}$ over the
accumulation interval is zero.
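
The steps just described can be written out as follows. This is a sketch of the argument, with $\theta$, $\theta_g$, and $\Delta\phi$ introduced here as shorthand (they are not symbols from the text), and with the summation limits and the code argument suppressed.

```latex
\begin{align*}
\theta(t_i) &= \omega_{IF} t_i - \left[\hat{\phi}_{jk} + \hat{\omega}_{Dopp\,jk}(t_i - \hat{\tau}_{jk})\right], \qquad
\theta_g(t_i) = (\omega_{IF} - \omega_{gjk})(t_i - t_{0gjk}) \\
\Delta\phi(t_i) &= \theta(t_i) - \theta_g(t_i) \\
\cos\theta &= \cos\theta_g \cos\Delta\phi - \sin\theta_g \sin\Delta\phi \\
\sin\theta &= \sin\theta_g \cos\Delta\phi + \cos\theta_g \sin\Delta\phi \\
I_{jk} = \sum_i y\, C_j \cos\theta
  &\approx \cos(\Delta\phi_{avgjk}) \sum_i y\, C_j \cos\theta_g - \sin(\Delta\phi_{avgjk}) \sum_i y\, C_j \sin\theta_g
   = I_{gjk}\cos(\Delta\phi_{avgjk}) + Q_{gjk}\sin(\Delta\phi_{avgjk}) \\
Q_{jk} = -\sum_i y\, C_j \sin\theta
  &\approx -\cos(\Delta\phi_{avgjk}) \sum_i y\, C_j \sin\theta_g - \sin(\Delta\phi_{avgjk}) \sum_i y\, C_j \cos\theta_g
   = Q_{gjk}\cos(\Delta\phi_{avgjk}) - I_{gjk}\sin(\Delta\phi_{avgjk})
\end{align*}
```

The sign of the $Q_{gjk}$ term in the first result follows from the leading minus sign in the definition of the quadrature accumulation.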

A decrease in C/N0 is expected from using an inexact frequency. The worst-case decrease
is expressed as a function of the frequency grid spacing $\Delta f$ and is given by

$$\Delta SNR = 20\log_{10}\left[\frac{\sin(\pi\,\Delta f\,T)}{\pi\,\Delta f\,T}\right]\ \text{dB} \tag{12}$$

where $\Delta f$ is in units of Hz, and $T$ is the integration period. Thus, a $\Delta f$ of 175 Hz causes a
worst-case SNR loss of 0.44 dB for T = 0.001 sec.
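
As a numerical check, and assuming the loss follows the usual sinc-shaped response of a coherent integration, $20\log_{10}[\sin(\pi\,\Delta f\,T)/(\pi\,\Delta f\,T)]$, the quoted figure can be reproduced:

```latex
\begin{align*}
\pi\,\Delta f\,T &= \pi \times 175 \times 0.001 \approx 0.5498 \\
\frac{\sin(0.5498)}{0.5498} &\approx \frac{0.5225}{0.5498} \approx 0.9504 \\
20\log_{10}(0.9504) &\approx -0.44\ \text{dB}
\end{align*}
```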

The cosine and sine signals on the grid are stored with a 2-bit binary sign and magnitude
representation. The format of this representation is defined in Table 2. This format assumes
that the cosine and sine signals have an amplitude of 2.4. Figure 3 shows how to sample a
sine wave to generate the optimal 2-bit representation, which has the minimum least square
error.

Figure 3: Illustration of how to sample a sine wave for a 2-bit representation (horizontal
axis: theta in rad).

Sign  Mag  |  Value
  0    0   |   -1
  0    1   |   -2
  1    0   |   +1
  1    1   |   +2

Table 2: Sign and magnitude combinations of the stored intermediate-frequency carrier sine
wave.



A simple EXCLUSIVE OR multiplication of sign bits and a redefinition of data bits
accomplishes base-band mixing. Multiplication of the RF front-end output representation
of Table 1 by the sine wave representation of Table 2 yields a result that can take on the
values -6, -3, -2, -1, +1, +2, +3, and +6. These can be represented by 3 bits according to
the scheme in Table 3. The high magnitude bit of Table 3 is simply the magnitude bit of the
RF front-end output from Table 1, and the low magnitude bit of Table 3 is the magnitude
bit of the base-band mixing sine wave from Table 2. Thus, these two magnitude bits are
available without the need for computation. The sign bit can be computed by executing an
EXCLUSIVE OR operation between the sign bits of the Table-1 RF front-end data and those
of the Table-2 base-band mixing signal data. Notice how the sign bit value's relationship
with the actual sign gets reversed from that of Tables 1 and 2.

Sign  High Mag  Low Mag  |  Value
  0       0        0     |   +1
  0       0        1     |   +2
  0       1        0     |   +3
  0       1        1     |   +6
  1       0        0     |   -1
  1       0        1     |   -2
  1       1        0     |   -3
  1       1        1     |   -6

Table 3: Sign, high-magnitude, and low-magnitude combinations of the base-band mixed
signal
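
The base-band mixing rule can be sketched in C on 32-sample words. The struct and function names below are hypothetical, and the decoder exists only to check results against Table 3; it assumes sample k of a word is stored in bit k.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-parallel form of one 32-sample word of the base-band mixed signal,
 * laid out as in Table 3. */
typedef struct {
    uint32_t sign;      /* 0 -> positive, 1 -> negative (Table 3 convention) */
    uint32_t high_mag;  /* copied from the RF front-end magnitude bits       */
    uint32_t low_mag;   /* copied from the carrier replica magnitude bits    */
} mixed_word_t;

/* Mix 32 RF samples (Table 1) with 32 carrier replica samples (Table 2). */
static mixed_word_t mix_to_baseband(uint32_t rf_sign, uint32_t rf_mag,
                                    uint32_t car_sign, uint32_t car_mag)
{
    mixed_word_t out;
    out.sign = rf_sign ^ car_sign;  /* equal input signs give 0, i.e. positive */
    out.high_mag = rf_mag;
    out.low_mag = car_mag;
    return out;
}

/* Decode sample `bit` of a mixed word into its integer value (checking aid). */
static int decode_mixed(mixed_word_t w, unsigned bit)
{
    assert(bit < 32u);
    int value = (((w.high_mag >> bit) & 1u) ? 3 : 1) *
                (((w.low_mag >> bit) & 1u) ? 2 : 1);
    return ((w.sign >> bit) & 1u) ? -value : value;
}
```

Only one XOR is computed per 32 samples; the two magnitude words are reused as they are.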

6.2 Mixing of the Base-Band Signal with a Local C/A Code 

Both prompt and early-minus-late correlations are needed to track the carrier frequency, 
carrier phase, and code phase in a GPS receiver. The prompt correlations are defined by 
equations (4) and (5) with $\Delta = 0$. The early-minus-late correlations are $I_{jk}(\Delta_{eml}/2) - I_{jk}(-\Delta_{eml}/2)$
and $Q_{jk}(\Delta_{eml}/2) - Q_{jk}(-\Delta_{eml}/2)$, where $\Delta_{eml}$ is the spacing between the
early and late PRN code replicas.

A hardware correlator generates in real-time a particular C/A code replica at the correct 
Doppler shifted frequency and phase. This approach is too time consuming in a software 
correlator. Instead, it is better to generate the C/A codes off-line and store the signals in 
a memory table. Storing all 32 C/A codes on a 2-dimensional grid of possible phases and 
Doppler shifts would require a large amount of memory, on the order of several gigabytes.

The required amount of storage can be greatly reduced by making two simplifications. 
First, the prompt code is stored as a single sign bit. This representation is shown in Table 4. 
The early-minus-late code, on the other hand, is stored in a two-bit representation (actually 
a 1.5 bit representation). It has a sign bit and a zero-mask bit, as denoted in Table 5. 



Sign  |  Value
  1   |   +1
  0   |   -1

Table 4: Sign bits of the prompt C/A code.

Sign  Zero Mask  |  Value
  X       0      |    0
  0       1      |   -2
  1       1      |   +2

Table 5: Sign and zero mask combinations of the stored early-minus-late PRN code replica.

The second simplification in the PRN code table is to ignore code Doppler shift variations. 
All signals in the table are assumed to have zero Doppler shift; i.e., all C/A codes in the 
table assume that $\hat{\tau}_{jk+1} - \hat{\tau}_{jk} = 0.001$ sec. The code phase errors due to this assumption are
eliminated by choosing a replica code from the table whose midpoint occurs at the desired
midpoint time $(\hat{\tau}_{jk} + \hat{\tau}_{jk+1})/2$. The only other effect of this assumption is a small correlation
power loss, which is no more than 0.014 dB if the magnitude of the Doppler shift is less than
10 kHz.

The C/A code table must include a selection of phases as measured relative to the sample 
times of the RF front-end outputs. The particular RF front-end that has been used has a 
sample spacing of 175 nsec. The C/A code table includes 14 different phases with respect 
to these sample times. This translates into a code phase spacing of 12.5 nsec, which equals
a pseudorange measurement digitization level of 3.8 m. Thus, the maximum measurement 
error is half of this digitization level, or 1.9 m. 
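
The relation between code phase spacing and pseudorange resolution is a one-line calculation. The sketch below uses a hypothetical function name and the defined value of the speed of light.

```c
#include <assert.h>

/* Pseudorange quantization step, in meters, when one RF sample interval is
 * divided into n_phases stored code phases. */
static double pseudorange_step_m(double sample_spacing_s, int n_phases)
{
    const double c = 299792458.0;   /* speed of light in m/s */
    assert(n_phases > 0);
    return c * sample_spacing_s / (double)n_phases;
}
```

With a 175 nsec sample spacing and 14 phases this evaluates to about 3.75 m, in line with the digitization level of roughly 3.8 m quoted above.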

The prompt and early-minus-late C/A code replicas can be mixed with the base-band 
code by bit re-definitions and a simple EXCLUSIVE OR operation. Suppose that one has 
a 3-bit base-band signal that is represented as in Table 3 and a prompt replica of the C/A 
code as represented in Table 4. Then the product of the two signals can be found by 
forming the EXCLUSIVE OR of the two inputs' sign bits to produce the sign bit of the 3-bit 
representation given in Table 6. The high and low magnitude bits of this mixed signal equal 
the high and low magnitude bits of the base-band signal from Table 3. Note that the Table 
6 representation is identical to that of Table 3 except for the inversion in the meaning of the 
sign bits. 



Sign  High Mag  Low Mag  |  Value
  0       0        0     |   -1
  0       0        1     |   -2
  0       1        0     |   -3
  0       1        1     |   -6
  1       0        0     |   +1
  1       0        1     |   +2
  1       1        0     |   +3
  1       1        1     |   +6

Table 6: Sign, high-magnitude, and low-magnitude combinations of the fully mixed prompt
integrand.

The mixing of the early-minus-late code with the base-band signal is also accomplished by 
an EXCLUSIVE OR operation on the two signals' sign bits in conjunction with a transcription
of the high magnitude, low magnitude, and zero mask bits. The resulting representation
is given in Table 7. 

Sign  High Mag  Low Mag  Zero Mask  |  Value
  X       X        X         0      |    0
  0       0        0         1      |   -2
  0       0        1         1      |   -4
  0       1        0         1      |   -6
  0       1        1         1      |  -12
  1       0        0         1      |   +2
  1       0        1         1      |   +4
  1       1        0         1      |   +6
  1       1        1         1      |  +12

Table 7: Sign, high-magnitude, low-magnitude, and zero-mask combinations of the fully
mixed early-minus-late integrand.
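
Code mixing for both the prompt and the early-minus-late replicas reduces to one XOR on the sign words. The sketch below uses hypothetical names; the decoder is a checking aid for the early-minus-late values and assumes sample k of a word is stored in bit k.

```c
#include <assert.h>
#include <stdint.h>

/* XOR the base-band sign word (0 -> positive) with a code sign word
 * (1 -> positive). The result follows the convention 1 -> positive,
 * 0 -> negative; the magnitude words carry over unchanged. */
static uint32_t mix_code_sign(uint32_t baseband_sign, uint32_t code_sign)
{
    return baseband_sign ^ code_sign;
}

/* Decode one sample of the fully mixed early-minus-late integrand.
 * zero_mask = 0 means the early and late codes cancel at that sample. */
static int decode_eml(uint32_t sign, uint32_t high_mag, uint32_t low_mag,
                      uint32_t zero_mask, unsigned bit)
{
    assert(bit < 32u);
    if (((zero_mask >> bit) & 1u) == 0u)
        return 0;
    int value = 2 * (((high_mag >> bit) & 1u) ? 3 : 1) *
                    (((low_mag >> bit) & 1u) ? 2 : 1);
    return ((sign >> bit) & 1u) ? value : -value;
}
```

The same `mix_code_sign` call serves the prompt and the early-minus-late paths, because both code tables use 1 for a positive chip.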

6.3 Bit-Wise Parallel Storage and Accumulations of Correlations

One can exploit the simple representations of signals in terms of 1 to 4 bits by using bit-wise
parallelism to perform the necessary calculations. Bit-wise parallel operations work with 
representations of the data that store successive samples in successive bits of a word. For 
example, 32 samples of the RF front-end output are stored in 2 32-bit words. One word stores 
the 32 sign bits of the 32 samples, and the other word stores the 32 magnitude bits. The 
stored tables of the base-band mixing cosine and sine waves have their sign and magnitude 
bits stored in separate words, with each 32-bit word storing 32 sign or magnitude bits that 
tabulate to 32 successive samples of the corresponding cosine or sine wave. Similarly, the 
stored tables of the prompt and early-minus-late codes store sign or sign and zero-mask bits 
in words with each word storing 32 samples worth of data. By this means the EXCLUSIVE 
OR operations that are involved in mixing operate on 32 samples at a time because the 
processor has a bit-wise EXCLUSIVE OR command and other bit-wise commands that 
operate in parallel on each of two input arguments' 32 bit pairs. 

The final operation in the correlation calculations is to sum the results over all of the
samples in a given estimated PRN code period. This operation requires additional bit-wise
parallel operations followed by operations that form totals over the bits in a given word.
This approach starts by performing bit-wise parallel Boolean logic for each of the 8 possible
values of the prompt integrand representation in Table 6. Each
such set of operations produces a value word with 0 bit values wherever the target integrand
value is not the value of the integrand at the corresponding sample time and 1s wherever it
is the integrand value. The 8 value words corresponding to the 8 possible Table 6 values are
formed as follows:

MINUSONE   = NOT(SIGN) AND [NOT(HIGHMAG) AND NOT(LOWMAG)]   (13)
MINUSTWO   = NOT(SIGN) AND [NOT(HIGHMAG) AND LOWMAG]        (14)
MINUSTHREE = NOT(SIGN) AND [HIGHMAG AND NOT(LOWMAG)]        (15)
MINUSSIX   = NOT(SIGN) AND [HIGHMAG AND LOWMAG]             (16)
PLUSONE    = SIGN AND [NOT(HIGHMAG) AND NOT(LOWMAG)]        (17)
PLUSTWO    = SIGN AND [NOT(HIGHMAG) AND LOWMAG]             (18)
PLUSTHREE  = SIGN AND [HIGHMAG AND NOT(LOWMAG)]             (19)
PLUSSIX    = SIGN AND [HIGHMAG AND LOWMAG]                  (20)

These operations can be carried out in 15 binary operations if one takes advantage of redun- 
dancy by storing common intermediate results. 
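
One way to reach the count of 15 operations is to form the three complements once, then the four magnitude combinations, then the eight sign combinations. The C sketch below illustrates that ordering; the function name, the enum, and the output layout are hypothetical. It follows the convention of the equations above, in which a SIGN bit of 1 marks a positive value.

```c
#include <assert.h>
#include <stdint.h>

/* Indices into the output array (hypothetical layout). */
enum { VAL_PLUS1, VAL_PLUS2, VAL_PLUS3, VAL_PLUS6,
       VAL_MINUS1, VAL_MINUS2, VAL_MINUS3, VAL_MINUS6, NUM_VALUES };

/* Form the 8 prompt value words for one 32-sample word using
 * 3 NOTs + 4 magnitude ANDs + 8 sign ANDs = 15 bit-wise operations. */
void prompt_value_words(uint32_t sign, uint32_t highmag, uint32_t lowmag,
                        uint32_t out[NUM_VALUES])
{
    assert(out != 0);

    /* Shared intermediate results: 3 complements. */
    uint32_t nsign = ~sign;
    uint32_t nhigh = ~highmag;
    uint32_t nlow  = ~lowmag;

    /* 4 magnitude selectors, reused for both signs. */
    uint32_t mag1 = nhigh & nlow;      /* magnitude 1 */
    uint32_t mag2 = nhigh & lowmag;    /* magnitude 2 */
    uint32_t mag3 = highmag & nlow;    /* magnitude 3 */
    uint32_t mag6 = highmag & lowmag;  /* magnitude 6 */

    /* 8 sign-qualified value words. */
    out[VAL_PLUS1]  = sign & mag1;
    out[VAL_PLUS2]  = sign & mag2;
    out[VAL_PLUS3]  = sign & mag3;
    out[VAL_PLUS6]  = sign & mag6;
    out[VAL_MINUS1] = nsign & mag1;
    out[VAL_MINUS2] = nsign & mag2;
    out[VAL_MINUS3] = nsign & mag3;
    out[VAL_MINUS6] = nsign & mag6;
}
```

Because every sample takes exactly one of the 8 values, the 8 output words are mutually exclusive and their OR is a word of all 1s.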

The operations for the early-minus-late integrand are similar. All of the values double in
this case, i.e., the MINUSSIX word becomes the MINUSTWELVE word. Also, an additional
AND operation must be performed with the ZEROMASK bits of Table 7 in order to mask out
sample times when the early and late PRN codes cancel each other. If one takes advantage
of the fact that the early-minus-late HIGHMAG and LOWMAG words are the same as those of the
prompt integrand, and if one ANDs the zero mask words with the SIGN and NOT(SIGN) words
before ANDing the results with the HIGHMAG and LOWMAG results, then the early-minus-late
integrands can be computed at a cost of only 11 additional binary operations.
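
A sketch of how the 11 additional operations might be arranged is given below: 1 NOT, 2 ANDs with the zero mask, and 8 ANDs with the four magnitude selectors already formed for the prompt integrand. The function name and array layout are hypothetical, and two conventions are assumed: a sign bit of 1 marks a positive value, and a zero-mask bit of 1 marks a sample at which the early and late codes do not cancel.

```c
#include <assert.h>
#include <stdint.h>

/* Indices into the output array (hypothetical layout). */
enum { EML_PLUS2, EML_PLUS4, EML_PLUS6, EML_PLUS12,
       EML_MINUS2, EML_MINUS4, EML_MINUS6, EML_MINUS12, NUM_EML_VALUES };

/* mag[0..3] are the magnitude selector words (magnitudes 1, 2, 3, 6)
 * shared with the prompt calculation. In the early-minus-late integrand
 * they stand for the doubled magnitudes 2, 4, 6, 12. */
void eml_value_words(uint32_t eml_sign, uint32_t zero_mask,
                     const uint32_t mag[4], uint32_t out[NUM_EML_VALUES])
{
    assert(mag != 0 && out != 0);

    uint32_t pos = eml_sign & zero_mask;   /* 1 AND         */
    uint32_t neg = ~eml_sign & zero_mask;  /* 1 NOT + 1 AND */

    for (int k = 0; k < 4; ++k) {          /* 8 ANDs        */
        out[EML_PLUS2 + k]  = pos & mag[k];
        out[EML_MINUS2 + k] = neg & mag[k];
    }
}
```

Applying the zero mask to the two sign words first means the mask is paid for twice rather than eight times.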

Additional zero-masking occurs in the first and last words of an accumulation interval. 
This is true because the start and stop times of an accumulation interval do not normally 
fall at the boundaries of data words. Therefore, the bits in the first word that precede the 
accumulation interval need to get zero-masked as do the bits in the last word that come after 
the end of the accumulation interval. 
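
The first-word and last-word masks can be generated with single shifts. The sketch below assumes, as an illustration only, that sample 0 of a word occupies its least-significant bit; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for the first word of an accumulation interval.
 * start_bit is the position (0..31) of the first sample that belongs to
 * the interval; all lower bit positions are zeroed. */
uint32_t first_word_mask(unsigned start_bit)
{
    assert(start_bit < 32u);
    return UINT32_C(0xFFFFFFFF) << start_bit;
}

/* Mask for the last word of an accumulation interval.
 * last_bit is the position (0..31) of the final sample that belongs to
 * the interval; all higher bit positions are zeroed. */
uint32_t last_word_mask(unsigned last_bit)
{
    assert(last_bit < 32u);
    return UINT32_C(0xFFFFFFFF) >> (31u - last_bit);
}
```

ANDing each value word of the first and last data words with these masks removes the samples that lie outside the interval before the bits are counted.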

The accumulation operation must sum the number of 1 bits in each of the 8 value words.
There are no such summation operations in a standard microprocessor's instruction set.
Therefore, the summations are accomplished using a table look-up. The value word is used
as the address in the memory table, and the table's output is set up to deliver the number of
1 values in the address. A 16-bit table has been used. This gives it a memory size of $2^{16}$
bytes, or 64 Kbytes, which makes it able to fit into the microprocessor's cache and allows for
very fast execution. Suppose that this operation is called BITSUM. Then it can be used to
compute the accumulation in equation (6) as follows:
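$$
\begin{aligned}
I_{jk}(0) ={}& 6 \sum_{l} \left[ \mathrm{BITSUM}(PLUSSIX_{jl}) - \mathrm{BITSUM}(MINUSSIX_{jl}) \right] \\
&+ 3 \sum_{l} \left[ \mathrm{BITSUM}(PLUSTHREE_{jl}) - \mathrm{BITSUM}(MINUSTHREE_{jl}) \right] \\
&+ 2 \sum_{l} \left[ \mathrm{BITSUM}(PLUSTWO_{jl}) - \mathrm{BITSUM}(MINUSTWO_{jl}) \right] \\
&+ \sum_{l} \left[ \mathrm{BITSUM}(PLUSONE_{jl}) - \mathrm{BITSUM}(MINUSONE_{jl}) \right]
\end{aligned}
\tag{13}
$$

where the index $l$ runs over the data words of the accumulation interval.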

The prompt quadrature and early-minus-late in-phase and quadrature accumulations can be 
computed using the same operations, but with differing value words that correspond to their 
respective integrands. 
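
The table look-up and the weighted sum can be sketched as follows. This is a minimal illustration under stated assumptions: the names `bitsum_init`, `bitsum32`, and `prompt_accumulation` are hypothetical, the table holds one byte per 16-bit address, and the value words are assumed to be stored 8 per data word in the order PLUSONE, PLUSTWO, PLUSTHREE, PLUSSIX, MINUSONE, MINUSTWO, MINUSTHREE, MINUSSIX.

```c
#include <assert.h>
#include <stdint.h>

/* 2^16 one-byte entries = 64 Kbytes. Entry a holds the number of 1 bits in a. */
static uint8_t bitsum_table[1u << 16];

/* Fill the table once at start-up. The count for a equals its lowest bit
 * plus the count for a >> 1, which has already been filled in. */
void bitsum_init(void)
{
    bitsum_table[0] = 0;
    for (uint32_t a = 1; a < (1u << 16); ++a) {
        bitsum_table[a] = (uint8_t)((a & 1u) + bitsum_table[a >> 1]);
    }
}

/* Count the 1 bits of a 32-bit value word with two 16-bit look-ups. */
int bitsum32(uint32_t w)
{
    return bitsum_table[w & 0xFFFFu] + bitsum_table[w >> 16];
}

/* Weighted accumulation over num_words data words.
 * values holds 8 value words per data word in the order described above. */
long prompt_accumulation(const uint32_t *values, int num_words)
{
    static const int weight[4] = {1, 2, 3, 6};
    long total = 0;

    assert(values != 0 && num_words >= 0);
    for (int w = 0; w < num_words; ++w) {
        const uint32_t *v = values + 8 * w;
        for (int k = 0; k < 4; ++k) {
            total += weight[k] * (bitsum32(v[k]) - bitsum32(v[4 + k]));
        }
    }
    return total;
}
```

`bitsum_init` must be called once before any accumulation. The same routine serves the other three accumulations when it is given their value words and, for the early-minus-late case, doubled weights.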

6.4 Computation Time Savings 

The bit-wise parallel operations save computation time in comparison to integer mathematical
correlation operations. Integer mathematics requires 6 multiplications and 4 additions
per sample (except for the last sample) in order to form the 4 required accumulations for
each channel. At a sampling rate of 5.714 MHz this translates into 57136 integer operations
per PRN code period. The bit-wise parallel method uses mostly simple logic and table look-up
operations in order to form the 4 accumulations. It uses 6 EXCLUSIVE OR operations
and 52 additional bit-wise logic operations per word. It uses 32 bit-summation operations
and 32 additions per summation word (actually, it only requires 16 summations for the last
word). Suppose that the nominal word length is 32 bits but that the summation words
are only 16 bits long. Then there are 180 words and 360 summation words in a typical
accumulation interval. If one totals the necessary operations along with some overhead that
occurs at the first and last words, then the new method requires 33496 operations per PRN
code period. Thus, there is a savings of almost a factor of two in the operation count. The
bit-wise method's logic and table look-up operations may execute more rapidly than
multiplication operations on a typical microprocessor, which would further increase the time
savings. Additional speed-up may come about because of a reduced number of accesses to
non-cache memory. The net speed-up is a factor of about 2.1 as measured on a 1.73 GHz AMD
Athlon processor.
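
The operation counts quoted above can be checked with a short calculation. The integer count follows directly from the stated per-sample cost. For the bit-wise count, the grouping below is only one reading of the text (58 logic operations per 32-bit word, plus 32 look-ups and 32 additions per 16-bit summation word); attributing the remaining 16 operations to first-word and last-word overhead is an assumption, not a figure given in the paper.

```latex
\begin{align*}
N_{\text{int}} &= (6 + 4) \times 5714 - 4 = 57136, \\
N_{\text{bit}} &\approx 180 \times (6 + 52) + 360 \times (32 + 32)
                = 10440 + 23040 = 33480, \\
33496 - 33480 &= 16 \quad \text{(left over for first- and last-word overhead)}, \\
N_{\text{int}} / 33496 &= 57136 / 33496 \approx 1.71 .
\end{align*}
```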

Note that this algorithm can be adapted to work with a different number of bits in the 
representation of the RF front-end output and of the cosine and sine mixing signals. An 
increase above 2 bits will make the logic more complex and will decrease the time savings 
versus straight integer arithmetic. A decrease to a 1-bit representation will do the opposite. 
For example, if the RF front-end uses one-bit digitization rather than two-bit digitization, 
then the operation count will decrease by a factor of almost 2 for the new method, which 
will make it about 4.2 times faster than straight integer arithmetic. 

Another method of creating the carrier replicas exists. This method adds a small
computational slow-down to the software, but reduces the number and length of signals stored.
Instead of storing a millisecond of the carrier signals on a coarse grid of frequencies, it is
possible to store only the values of cosine and sine over one period, from 0 to $2\pi$. Then, to
generate the carrier replica, one needs to compute the argument of the cosine and sine
functions in equations (2) and (3). The argument modulo $2\pi$ is then used as the index into
the stored cosine and sine signals. This step adds the extra requirement of computing the
argument in real-time, however, as compared to storing the signals ahead of time. If the cosine
and sine arguments are specified as floating-point numbers, these computations produce a
significant slow-down. If the arguments are expressed as 64-bit integers, on the other hand,
the overall computational time of the integer-based correlation algorithms grows by less than
5% in comparison to the method that uses a pre-computed grid of carrier replicas. This
method of generating the cosine and sine signals is not easily implemented in the bit-wise
algorithms because of the additional cost of packing the representations into bit-wise parallel
words after they get computed.
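
A common way to realize a 64-bit integer argument is a phase accumulator in which the full range of an unsigned 64-bit integer represents one carrier cycle, so that the reduction modulo $2\pi$ is simply integer wrap-around. The sketch below illustrates that general technique and is not taken from the receiver's code; the names `nco_t`, `nco_step`, and `nco_next_index` and the 256-entry table size are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_BITS 8u  /* assumed: 256 stored samples per carrier cycle */

/* Phase accumulator: phase and step are in units of 2^-64 cycle. */
typedef struct {
    uint64_t phase;
    uint64_t step;
} nco_t;

/* Phase advance per sample for a carrier at freq_hz sampled at fs_hz.
 * Uses floating point, but only once per frequency update. */
uint64_t nco_step(double freq_hz, double fs_hz)
{
    assert(fs_hz > 0.0 && freq_hz >= 0.0 && freq_hz < fs_hz);
    return (uint64_t)(freq_hz / fs_hz * 18446744073709551616.0); /* 2^64 */
}

/* Return the index into the stored one-cycle cosine and sine tables for the
 * current sample, then advance the phase. Unsigned overflow wraps modulo
 * 2^64, which is the modulo-2*pi reduction. */
unsigned nco_next_index(nco_t *nco)
{
    unsigned index = (unsigned)(nco->phase >> (64u - TABLE_BITS));
    nco->phase += nco->step;
    return index;
}
```

Per sample, the cost is one shift and one addition, with no explicit modulus operation.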

6.5 Storage Requirements

The pre-computed base-band mixing signals and PRN codes require a certain amount of 
memory. Each replica signal must occupy 180 32-bit words in order to be guaranteed to 
cover the full 5714 RF front-end samples that occur in one PRN code period for any possible 
code period start time within the 32 samples of the initial word. Thus, 180*4=720 bytes 
are required for each bit of each signal that must get stored. The sine and cosine waves 
each have two-bit representations, which translates into a storage requirement of 2880 bytes 
for the carrier replicas at a given Doppler shift. There are 115 Doppler shifts that must be
stored in order to cover the -10 kHz to +10 kHz range with a 175 Hz grid spacing. This
translates into 323 Kbytes of storage for all of the carrier replica signals.

The pre-computed PRN codes also require a significant amount of storage. The prompt 
code has a 1-bit representation, and the early-minus-late code has a 2-bit representation. 
This translates into a total of 2160 bytes for a single code phase of a single PRN number. 
The table must include 14 different code phases per RF front-end sample multiplied by 32 
RF front-end samples per word, which yields a storage requirement of 945 Kbytes per PRN 
code and 30 Mbytes for all 32 PRN codes. 
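
The storage figures in this section follow from a few multiplications, reproduced in the C sketch below. The function names are hypothetical, and the conversion to Kbytes assumes 1 Kbyte = 1024 bytes.

```c
#include <assert.h>

enum {
    WORDS_PER_REPLICA = 180, /* 32-bit words per stored bit of a replica */
    BYTES_PER_WORD    = 4,
    SAMPLES_PER_WORD  = 32
};

/* Bytes needed for the cosine and sine replica tables:
 * 2 signals, bits_per_signal bits each, at num_doppler grid frequencies. */
long carrier_table_bytes(int bits_per_signal, int num_doppler)
{
    assert(bits_per_signal > 0 && num_doppler > 0);
    return 2L * bits_per_signal * WORDS_PER_REPLICA * BYTES_PER_WORD * num_doppler;
}

/* Bytes needed for the PRN code tables: 1 prompt bit plus 2 early-minus-late
 * bits per sample, for every code phase and every start position in a word. */
long code_table_bytes(int phases_per_sample, int num_prn)
{
    assert(phases_per_sample > 0 && num_prn > 0);
    return 3L * WORDS_PER_REPLICA * BYTES_PER_WORD
              * phases_per_sample * SAMPLES_PER_WORD * num_prn;
}
```

With the values quoted in the text, `carrier_table_bytes(2, 115)` gives 331200 bytes (about 323 Kbytes), `code_table_bytes(14, 1)` gives 967680 bytes (945 Kbytes), and `code_table_bytes(14, 32)` gives 30965760 bytes (roughly 30 Mbytes).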
Note that it is possible to reduce these storage requirements by a factor of 32 if one does
not store different code replicas for the 32 different possible locations within a data word of
the first RF front-end sample of an accumulation interval. The memory savings comes at
the cost of additional bit-shifting operations that are needed in order to bit-align the code
replica's start bit with the estimated start bit in the incoming data word stream. Experience
with the 1.73 GHz AMD Athlon processor indicates that this added computational cost is
minimal.
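
The bit-shifting alternative can be illustrated as follows. The sketch assumes that sample 0 of each word sits in the least-significant bit and that the stored replica starts at bit 0 of its first word; under that assumption, delaying the replica by `shift` samples means shifting each word left and bringing in the top bits of the preceding word. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return word i of a replica that has been delayed by `shift` samples
 * (0..31) relative to the stored, unshifted copy in replica[].
 * Bits that precede the replica's first sample come out as 0. */
uint32_t shifted_replica_word(const uint32_t *replica, int i, unsigned shift)
{
    assert(replica != 0 && i >= 0 && shift < 32u);

    if (shift == 0u) {
        return replica[i];  /* avoids an undefined 32-bit shift below */
    }
    uint32_t prev = (i > 0) ? replica[i - 1] : 0u;
    return (replica[i] << shift) | (prev >> (32u - shift));
}
```

Each aligned word costs two shifts and one OR. A caller that needs the complete shifted replica also has to form one extra trailing word from the bits carried out of the last stored word.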

The micro-computer stores the most recent 21 msec of RF front-end data in a circular 
buffer. This allows it to process the differing code periods for different satellites during 
different iterations of a regularly scheduled program thread. This buffer occupies 30 Kbytes 
of memory. 

7 Code, Carrier Phase, and Carrier Frequency Measurements

Navigation calculations require measured values of the PRN code phase, carrier phase, and 
carrier frequency. The measurements for each satellite must occur at the exact same time. 
The TIC function provides a periodic timing scheme to synchronize these measurements. 
When a TIC occurs the correlator latches all of the C/A code phases, carrier phases, and 
carrier frequencies along with the code epoch counters, and it makes these available to the 
remaining GPS receiver software. The GPS receiver uses the code phase and epoch counters 
to compute the pseudorange to each satellite. 

The software correlator keeps track of the code and carrier phase of each signal as determined
by the code chipping rate and the carrier Doppler shift inputs. Suppose that $f_{cjk}$ is
the receiver's estimated code chipping rate for satellite j during its k-th PRN code period and
suppose that $\omega_{Doppjk}$ is the associated carrier Doppler shift. $f_{cjk}$ will have been
determined either by an acquisition search procedure or, if tracking, by a delay-locked loop.
Likewise, $\omega_{Doppjk}$ will have been defined by an acquisition procedure or, if tracking has
commenced, by a phase-locked loop or a frequency-locked loop. The software correlator uses
these two quantities to update its self-initialized code and carrier phases according to the
formulas:

$$\tau_{jk+1} = \tau_{jk} + \frac{1023}{f_{cjk}} \tag{14}$$

$$\phi_{jk+1} = \phi_{jk} + \omega_{Doppjk} \left( \tau_{jk+1} - \tau_{jk} \right) \tag{15}$$

Here $\tau_{jk}$ is the start time of the k-th code period of satellite j and $\phi_{jk}$ is the
carrier phase at that time.
In the software receiver a TIC occurs at the millisecond boundaries. At each TIC the
code phase of each signal is computed in the following manner (referring to Figure 4):

$$\psi_{jTIC} = 1023 \left[ \frac{t_{TIC} - \tau_{jk+1}}{\tau_{jk+2} - \tau_{jk+1}} \right] \tag{16}$$

where $\psi_{jTIC}$ is the code phase in chips of signal j at TIC time $t_{TIC}$. The epoch counters,
which are simply a running total of the number of code periods, are incremented at each
code start/stop time.

The carrier phase calculation at the TIC time is similar to the code phase calculation:

$$\phi_{jTIC} = \phi_{jk+1} + \omega_{Doppjk+1} \left( t_{TIC} - \tau_{jk+1} \right) \tag{17}$$

where $\phi_{jTIC}$ is the carrier phase at the TIC time of Figure 4. The Doppler shift that gets
returned at this TIC time is $\omega_{Doppjk+1}$.
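
The TIC latching step can be sketched in C as below. The sketch assumes that the code phase is the elapsed fraction of the current code period scaled to the 1023 chips of the C/A code, and that the carrier phase grows linearly at the current Doppler rate from its value at the start of the period. The structure, field names, and units (seconds, radians, radians per second) are hypothetical, and floating point is used for clarity only.

```c
#include <assert.h>

/* State of one channel for the code period that contains the TIC
 * (hypothetical structure). */
typedef struct {
    double tau_start;  /* start time of the current code period (s)   */
    double tau_next;   /* start time of the following code period (s) */
    double phi_start;  /* carrier phase at tau_start (rad)            */
    double omega_dopp; /* carrier Doppler shift in this period (rad/s) */
} channel_state_t;

/* Code phase in chips at the TIC time. */
double tic_code_phase_chips(const channel_state_t *c, double t_tic)
{
    assert(c != 0 && c->tau_next > c->tau_start);
    return 1023.0 * (t_tic - c->tau_start) / (c->tau_next - c->tau_start);
}

/* Carrier phase in radians at the TIC time. */
double tic_carrier_phase(const channel_state_t *c, double t_tic)
{
    assert(c != 0);
    return c->phi_start + c->omega_dopp * (t_tic - c->tau_start);
}
```

A TIC that falls exactly halfway through a code period, for example, yields a code phase of 511.5 chips.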
[Figure 4 labels: code period; code phase; PRN code start/stop time]

Figure 4: A schematic diagram illustrating the code phase measurement 

8 Fixed-Point Computations 

Real-time software is fastest when using fixed-point computations. Floating-point operations,
such as addition, multiplication, and division, take much longer than their integer
equivalents. For example, on an AMD Athlon one floating-point division takes 8 clock cycles
while a fixed-point division takes 4-5 clock cycles. Furthermore, floating-point variables
require more space in memory than do integer variables. Thus, it is worthwhile to carry out
the majority of the calculations in an integer-based format. The mixing and correlations are
already in integer format, but computations like calculating the code start/stop times and
the angle of rotation for I and Q are inherently floating-point calculations.

No process exists for blindly converting floating-point computations into fixed-point 
equivalents. Any reasonable process includes determining the maximum values and the 
desired minimum resolution of the calculations. These figures help to determine the required 
sizes of the integers. Maximum values are important because many parameters, such as 
the code start/stop times, increase continually over time, which makes overflow a concern. 
Resolution is important because the required accuracy of a computation must be maintained 
when using an integer format.

As an example, consider the C/A code start/stop times. The first step is to define how
precise the times must be. This depends on how precisely the pseudorange must be measured.
Assume that the pseudorange must be measured to within 0.5 meters, or about 1.667 nsec. The
maximum value for a 32-bit unsigned integer is about $2^{32} \approx 4.3 \times 10^9$, which implies
that a 32-bit representation of the code start/stop times in units of 1.667 nsec would overflow
after about 7 seconds of operation. Since the code start/stop times continually increase over
time, a 64-bit unsigned integer is more appropriate. A 64-bit integer also allows for an
increase in the precision of the start/stop times. A good compromise between precision and
avoidance of overflow is to count start/stop times in units of 100 picoseconds. In this case
overflow will occur after 58.5 years.
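
The overflow figures can be verified, and the time representation illustrated, with the C sketch below. The tick size of 100 picoseconds comes from the text. The function names and the choice to express the chipping rate in millihertz are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TICKS_PER_SECOND UINT64_C(10000000000) /* 1 tick = 100 picoseconds */

/* Whole seconds that fit in an unsigned 64-bit tick counter. */
uint64_t seconds_until_overflow(uint64_t ticks_per_second)
{
    assert(ticks_per_second > 0u);
    return UINT64_MAX / ticks_per_second;
}

/* Advance a code start time by one code period of 1023 chips.
 * chip_rate_millihz is the chipping rate in units of 0.001 Hz, so the
 * nominal 1.023 MHz rate is 1023000000. The numerator is about 1.023e16,
 * which is well inside the 64-bit range. */
uint64_t next_code_start(uint64_t start_ticks, uint64_t chip_rate_millihz)
{
    assert(chip_rate_millihz > 0u);
    uint64_t period_ticks =
        (UINT64_C(1023) * TICKS_PER_SECOND * UINT64_C(1000)) / chip_rate_millihz;
    return start_ticks + period_ticks;
}
```

With these units a nominal code period is 10,000,000 ticks, and the counter wraps after about 1.84e9 seconds, which is consistent with the 58.5 years quoted above. The integer division truncates, so a real implementation would need to consider how the discarded fraction of a tick accumulates.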

Converting the I and Q rotation angle into a fixed-point equivalent is more complicated.
This is so because of the continually growing nature of the carrier phase angle and because of
the large intermediate frequency that gets multiplied by a time in the last term of equation
(10). In the former case, however, modulo arithmetic is useful since $\Delta\phi_{augjk}$ is modulo
$2\pi$, and thus the terms that compose it can be computed modulo $2\pi$.

9 Performance Results 

A sample screen-shot from the real-time software receiver is provided in Table 8. This
screen-shot shows the receiver tracking 9 channels. The antenna used is an L1 antenna with a
pre-amp that has 26 dB of gain. It is mounted on the roof of Rhodes Hall at Cornell University.
Position accuracy is on the order of 10-15 meters, which is comparable to other receivers
that use hardware correlators.

Table 8: Screenshot of the software GPS receiver.

The tracking loop performance of the software receiver code has been evaluated by comparing
it to a software receiver implemented in MATLAB that uses smoother-based carrier
and code tracking loops and that operates on the same data in an off-line mode. Figure
5 compares the Doppler shift of the carrier from the real-time software receiver with that
of the MATLAB smoother. The mean frequency deviation between the two after the transient
period is less than 2 Hz. Thus, the real-time software receiver's FLL functions properly with
the software-computed accumulations.

It is important to compare the tracking and navigation performance between the software
receiver and a receiver that uses a hardware correlator. A receiver that uses the Mitel GP2021
digital hardware correlator was used for the comparison. This receiver uses the same Cascade
GPS software as the software receiver. Both receivers also use the Mitel GP2015 RF front-end.
The receivers are set up to run at the same time and are connected to the same
roof-mounted antenna. Performing a simple side-by-side visual comparison shows that SNR
values differ by less than 1 dB and that the navigation solutions differ by no more than 5-10
meters.

Figure 5: Frequency response of the real-time software receiver's FLL compared with that
of a MATLAB smoother.

An important measure of the efficiency of the algorithms used in the software correlator
is the average duration per millisecond required to base-band mix and correlate 12 channels.
The current software correlator requires 390 microseconds, which is a 39% duty cycle.
However, the algorithms are not optimized. We are aware of numerous increases in speed
that will most likely reduce this value by about 5-10%. Furthermore, as faster PCs become
available, the processing time will continue to decrease.

It is important to compare the software receiver presented in this paper with the one
presented in Akos et al. [2001a]. A proper comparison gives a sense of the efficiency of the
algorithms implemented in this receiver. Akos et al. [2001a] show a plot of the required
computation time for their 6-channel software receiver to process 1 second of GPS data as a
function of x86 microprocessor speed. From the plot, the computation time of the receiver
at an RF front-end sampling frequency of 5.714 MHz on a 1.73 GHz processor is about 1
second. The software receiver discussed in this paper processes 6 channels in about 0.23
sec. Akos et al. [2001b] mention a potential speed improvement that may decrease the
computation time by 30%. Taking this into consideration, the software receiver described in
this paper is still over 3 times faster.

Two different bit-grabbers have been tested with the real-time software receiver. The 
first one uses an analog down conversion scheme, while the other one implements a direct 
ADC down conversion. Previously, Akos and Tsui [1996] presented an implementation of a 
direct ADC down conversion GPS front-end. To evaluate the front-end, they stored and off- 
line processed 3 msec of GPS data. In contrast, the direct ADC front-end discussed in this 
paper has been tested with the real-time software receiver and ran continuously for several 
hours. The performance results are similar to those of the analog RF front-end. 

10 Summary and Concluding Remarks 

A 12-channel real-time software GPS L1 receiver that runs on a common PC has been
implemented and tested. The hardware consists of an RF bit-grabber card, a data acquisition
system, and a PC with a 1.73 GHz AMD Athlon processor running RT-Linux. The software
consists of the data acquisition code, the software correlator, and GPS software that provides 
the typical GPS functions such as navigation and tracking. The software correlator, running 
on the PC's processor, consumes about 39% of the CPU capacity, leaving the PC time to 
perform other tasks. Furthermore, optimizations exist that may decrease the CPU usage by 
about 5-10%. 

The software correlator algorithms have been tested in depth. They have been compared
to both a hardware correlator and a non-real-time software receiver implemented in
MATLAB. These comparisons show that real-time software correlation can be implemented
without loss of performance.